Finding Mutated Subnetworks Associated with Survival in Cancer
Next-generation sequencing technologies allow the measurement of somatic
mutations in a large number of patients from the same cancer type. One of the
main goals in analyzing these mutations is the identification of mutations
associated with clinical parameters, such as survival time. This goal is hindered by the genetic heterogeneity of mutations in cancer, which arises because genes and mutations act in the context of pathways. To identify mutations
associated with survival time it is therefore crucial to study mutations in the
context of interaction networks.
In this work we study the problem of identifying subnetworks of a large
gene-gene interaction network that have mutations associated with survival. We
formally define the associated computational problem by using a score for
subnetworks based on the test statistic of the log-rank test, a widely used
statistical test for comparing the survival of two populations. We show that
the computational problem is NP-hard and we propose a novel algorithm, called
Network of Mutations Associated with Survival (NoMAS), to solve it. NoMAS is
based on the color-coding technique, which has previously been used in other
applications to find the highest scoring subnetwork with high probability when
the subnetwork score is additive. In our case the score is not additive;
nonetheless, we prove that under a reasonable model for mutations in cancer
NoMAS does identify the optimal solution with high probability. We test NoMAS
on simulated and cancer data, comparing it to approaches based on single gene
tests and to various greedy approaches. We show that our method does indeed
find the optimal solution and performs better than the other approaches.
Moreover, on two cancer datasets our method identifies subnetworks significantly associated with survival even when no individual gene shows significant association with survival in isolation.
Comment: This paper was selected for oral presentation at RECOMB 2016, and an abstract is published in the conference proceedings.
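At the heart of the formulation is the subnetwork score: the normalized log-rank statistic comparing the survival of patients with and without a mutation in the candidate subnetwork. The following minimal sketch (Python, with illustrative variable names; it is not the NoMAS implementation) shows how that statistic can be computed for a given binary split of the patients.

```python
import numpy as np

def logrank_statistic(time, event, in_group):
    """Normalized log-rank statistic comparing survival of two groups.

    time     : follow-up times
    event    : 1 if the event (death) was observed, 0 if censored
    in_group : 1 if the patient has a mutation in the candidate subnetwork
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    in_group = np.asarray(in_group, dtype=int)

    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event == 1]):       # distinct event times
        at_risk = time >= t                     # patients still under observation
        n = at_risk.sum()                       # total at risk
        n1 = (at_risk & (in_group == 1)).sum()  # at risk in the mutated group
        d = ((time == t) & (event == 1)).sum()  # events at time t
        d1 = ((time == t) & (event == 1) & (in_group == 1)).sum()
        o_minus_e += d1 - d * n1 / n            # observed minus expected in group 1
        if n > 1:                               # hypergeometric variance term
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e / np.sqrt(var)             # approx. N(0,1) under the null

# Example: patients mutated in the subnetwork (in_group=1) die earlier.
print(logrank_statistic(time=[2, 3, 5, 8, 9, 12],
                        event=[1, 1, 1, 0, 1, 1],
                        in_group=[1, 1, 1, 0, 0, 0]))
```

Because this score is a ratio involving all selected patients, it is not a sum of per-gene contributions, which is exactly why the standard additive color-coding analysis does not apply directly.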
Finding the True Frequent Itemsets
Frequent Itemsets (FIs) mining is a fundamental primitive in data mining. It requires identifying all itemsets appearing in at least a fraction $\theta$ of a transactional dataset $\mathcal{D}$. Often though, the ultimate goal of mining $\mathcal{D}$ is not an analysis of the dataset \emph{per se}, but the understanding of the underlying process that generated it. Specifically, in many applications $\mathcal{D}$ is a collection of samples obtained from an unknown probability distribution $\pi$ on transactions, and by extracting the FIs in $\mathcal{D}$ one attempts to infer itemsets that are frequently (i.e., with probability at least $\theta$) generated by $\pi$, which we call the True Frequent Itemsets (TFIs). Due to the inherently stochastic nature of the generative process, the set of FIs is only a rough approximation of the set of TFIs, as it often contains a huge number of \emph{false positives}, i.e., spurious itemsets that are not among the TFIs. In this work we design and analyze an algorithm to identify a threshold $\hat{\theta}$ such that the collection of itemsets with frequency at least $\hat{\theta}$ in $\mathcal{D}$ contains only TFIs with probability at least $1-\delta$, for some user-specified $\delta$. Our method uses results from statistical learning theory involving the (empirical) VC-dimension of the problem at hand. This allows us to identify almost all the TFIs without including any false positive. We also experimentally compare our method with the direct mining of $\mathcal{D}$ at frequency $\theta$ and with techniques based on widely used standard bounds (i.e., the Chernoff bounds) on the binomial distribution, and show that our algorithm outperforms these methods and achieves even better results than what is guaranteed by the theoretical analysis.
Comment: 13 pages. Extended version of work that appeared in the SIAM International Conference on Data Mining, 2014.
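To make the thresholding idea concrete, here is a sketch under a generic uniform-deviation bound: if every itemset's empirical frequency is within $\epsilon$ of its true frequency, then any itemset mined at $\hat{\theta} = \theta + \epsilon$ must have true frequency at least $\theta$. The constant form of the bound below is a textbook VC-style illustration, not the paper's (empirical) VC-dimension machinery, and the VC-dimension value passed in is an assumed input.

```python
import math

def deviation_bound(d, m, delta):
    """One textbook uniform-deviation bound for a range space of
    VC-dimension at most d, over m samples, at confidence 1 - delta.
    Exact constants vary across statements; this form is illustrative.
    """
    return math.sqrt((d * math.log(2 * math.e * m / d) + math.log(4 / delta)) / (2 * m))

def raised_threshold(theta, d, m, delta):
    """Mining at theta + eps returns only itemsets whose true frequency
    is at least theta, with probability >= 1 - delta: if every empirical
    frequency is within eps of its true value, an itemset with empirical
    frequency >= theta + eps has true frequency >= theta.
    """
    return theta + deviation_bound(d, m, delta)

# Example: 10^6 transactions, an (assumed) VC-dimension bound of 20, delta = 0.05.
print(raised_threshold(theta=0.01, d=20, m=10**6, delta=0.05))
```

The trade-off is visible in the formula: a tighter VC-dimension bound or a larger dataset shrinks $\epsilon$, so fewer TFIs are missed by the raised threshold.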
Counterpart semantics for a second-order mu-calculus
We propose a novel approach to the semantics of quantified μ-calculi, considering models where states are algebras; the evolution relation is given by a counterpart relation (a family of partial homomorphisms), allowing for the creation, deletion, and merging of components; and formulas are interpreted over sets of state assignments (families of substitutions, associating formula variables to state components). Our proposal avoids the limitations of existing approaches, which usually enforce restrictions on the evolution relation: the resulting semantics is streamlined and intuitively appealing, yet general enough to cover most of the alternative proposals we are aware of.
Attention-Based Deep Learning Framework for Human Activity Recognition with User Adaptation
Sensor-based human activity recognition (HAR) requires predicting the actions
of a person based on sensor-generated time series data. HAR has attracted major
interest in the past few years, thanks to the large number of applications
enabled by modern ubiquitous computing devices. While several techniques based
on hand-crafted feature engineering have been proposed, the current
state-of-the-art is represented by deep learning architectures that
automatically obtain high level representations and that use recurrent neural
networks (RNNs) to extract temporal dependencies in the input. RNNs have
several limitations, in particular in dealing with long-term dependencies. We
propose a novel deep learning framework based on a purely attention-based mechanism that overcomes the limitations of the state-of-the-art. We show that our proposed attention-based architecture is considerably more powerful than previous approaches, with an average increment on the F1 score over the previous best performing model.
Furthermore, we consider the problem of personalizing HAR deep learning models,
which is of great importance in several applications. We propose a simple and
effective transfer-learning based strategy to adapt a model to a specific user,
providing an average increment on the F1 score on the predictions for
that user. Our extensive experimental evaluation proves the significantly
superior capabilities of our proposed framework over the current
state-of-the-art and the effectiveness of our user adaptation technique.
Comment: Accepted for publication in the IEEE Sensors Journal.
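For intuition on why a purely attention-based architecture sidesteps the long-term-dependency problem of RNNs, here is a minimal single-head scaled dot-product self-attention over a sensor window, in plain NumPy. The dimensions and projection matrices are illustrative assumptions; this is not the paper's architecture.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a sensor window.

    X          : (T, F) window of T time steps with F sensor channels
    Wq, Wk, Wv : (F, D) projection matrices (learned in a real model)
    Returns a (T, D) representation where each step attends to all steps,
    so long-range dependencies need not survive a step-by-step recurrence.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])            # (T, T) pairwise scores
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)     # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
T, F, D = 128, 9, 16          # e.g., 128 steps of a 9-channel IMU signal
X = rng.standard_normal((T, F))
Wq, Wk, Wv = (rng.standard_normal((F, D)) * 0.1 for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)   # (128, 16)
```

In the same spirit, the user-adaptation strategy described above amounts to transfer learning: keep the pretrained representation and fine-tune on the target user's data.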
PRESTO: Simple and Scalable Sampling Techniques for the Rigorous Approximation of Temporal Motif Counts
The identification and counting of small graph patterns, called network
motifs, is a fundamental primitive in the analysis of networks, with
application in various domains, from social networks to neuroscience. Several
techniques have been designed to count the occurrences of motifs in static
networks, with recent work focusing on the computational challenges provided by
large networks. Modern networked datasets contain rich information, such as the time at which the events modeled by the network's edges happened, which can provide useful insights into the process modeled by the network. The analysis
of motifs in temporal networks, called temporal motifs, is becoming an
important component in the analysis of modern networked datasets. Several
methods have been recently designed to count the number of instances of
temporal motifs in temporal networks, which is even more challenging than its
counterpart for static networks. Such methods are either exact, and not
applicable to large networks, or approximate, but provide only weak guarantees
on the estimates they produce and do not scale to very large networks. In this
work we present an efficient and scalable algorithm to obtain rigorous
approximations of the count of temporal motifs. Our algorithm is based on a
simple but effective sampling approach, which renders our algorithm practical
for very large datasets. Our extensive experimental evaluation shows that our
algorithm provides estimates of temporal motif counts which are more accurate
than the state-of-the-art sampling algorithms, with significantly lower running
time than exact approaches, enabling the study of temporal motifs of larger size than those considered in previous works on networks with billions of edges.
Comment: 19 pages, 5 figures. To appear in SDM 2021.
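The flavor of the sampling approach can be sketched as follows: draw windows of length c times the motif duration delta uniformly at random, count motif instances exactly inside each window, and reweight each instance by the inverse of its inclusion probability. The Horvitz-Thompson-style estimator below and the toy helper two_path_instances are illustrative assumptions, not PRESTO's exact scheme.

```python
import random

def sampled_motif_estimate(edges, find_instances, delta, c=5, samples=100, seed=0):
    """Estimate a temporal-motif count by sampling windows of length c*delta.

    edges          : list of (u, v, t) triples sorted by timestamp t
    find_instances : exact counter returning the (t_min, t_max) time span of
                     every motif instance inside a list of edges
    An instance spanning [a, b] (with b - a <= delta) is fully contained in a
    random window [s, s + c*delta) with probability (c*delta - (b - a)) / L,
    where L is the length of the interval the start s is drawn from; weighting
    each found instance by the inverse of that probability gives an unbiased
    estimate of the total count.
    """
    rng = random.Random(seed)
    W = c * delta
    t_lo, t_hi = edges[0][2], edges[-1][2]
    lo, hi = t_lo - W, t_hi               # start range covering every instance
    L = hi - lo
    total = 0.0
    for _ in range(samples):
        s = rng.uniform(lo, hi)
        window = [e for e in edges if s <= e[2] < s + W]
        for a, b in find_instances(window):
            total += L / (W - (b - a))    # inverse-probability weight
    return total / samples

def two_path_instances(window, delta=10):
    """Toy exact counter: spans of u->v, v->w edge pairs within delta time."""
    return [(t1, t2)
            for u, v, t1 in window
            for v2, w, t2 in window
            if v == v2 and 0 < t2 - t1 <= delta]

edges = sorted([(1, 2, 3), (2, 3, 8), (2, 4, 30), (1, 2, 50), (2, 5, 55)],
               key=lambda e: e[2])
print(sampled_motif_estimate(edges, two_path_instances, delta=10))
```

The appeal of window sampling is that the expensive exact counter only ever runs on small slices of the data, which is what makes the approach practical on very large temporal networks.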
Efficient algorithms to discover alterations with complementary functional association in cancer
Recent large cancer studies have measured somatic alterations in an
unprecedented number of tumours. These large datasets allow the identification
of cancer-related sets of genetic alterations by identifying relevant
combinatorial patterns. Among such patterns, mutual exclusivity has been
employed by several recent methods that have shown its effectiveness in characterizing gene sets associated with cancer. Mutual exclusivity arises
because of the complementarity, at the functional level, of alterations in
genes which are part of a group (e.g., a pathway) performing a given function.
The availability of quantitative target profiles, from genetic perturbations or
from clinical phenotypes, provides additional information that can be leveraged
to improve the identification of cancer related gene sets by discovering groups
with complementary functional associations with such targets.
In this work we study the problem of finding groups of mutually exclusive
alterations associated with a quantitative (functional) target. We propose a
combinatorial formulation for the problem, and prove that the associated computational problem is NP-hard. We design two algorithms to solve
the problem and implement them in our tool UNCOVER. We provide analytic
evidence of the effectiveness of UNCOVER in finding high-quality solutions and
show experimentally that UNCOVER finds sets of alterations significantly
associated with functional targets in a variety of scenarios. In addition, our
algorithms are much faster than the state-of-the-art, allowing the analysis of
large datasets of thousands of target profiles from cancer cell lines. We show
that on one such dataset from project Achilles our methods identify several
significant gene sets with complementary functional associations with targets.
Comment: Accepted at RECOMB 2018.
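As a concrete (assumed) illustration of the combinatorial flavor of the problem, the sketch below greedily selects genes under an objective that rewards target-weighted coverage and penalizes overlapping alterations; the exact UNCOVER score and algorithms differ, so treat this as a toy model of mutual exclusivity with a quantitative target.

```python
import numpy as np

def greedy_exclusive_set(A, w, k):
    """Greedily pick k genes (rows of A) whose alterations are approximately
    mutually exclusive and associated with a quantitative target w.

    A : (genes, samples) binary alteration matrix
    w : (samples,) quantitative target profile
    Objective (an illustrative choice, not UNCOVER's exact score): sum of w
    over covered samples, minus w-weighted excess coverage of samples hit by
    more than one chosen gene, so overlaps are penalized.
    """
    def score(rows):
        cov = A[rows].sum(axis=0)     # how many chosen genes alter each sample
        return (w * (cov >= 1)).sum() - (w * np.maximum(cov - 1, 0)).sum()

    chosen = []
    for _ in range(k):
        gains = [(score(chosen + [g]), g)
                 for g in range(A.shape[0]) if g not in chosen]
        _, best_gene = max(gains)
        chosen.append(best_gene)
    return chosen

A = np.array([[1, 1, 0, 0, 0],    # gene 0 alters samples 0,1
              [0, 0, 1, 1, 0],    # gene 1 alters samples 2,3 (complementary)
              [1, 0, 1, 0, 0]])   # gene 2 overlaps both
w = np.array([2.0, 1.5, 1.0, 2.5, 0.1])
print(greedy_exclusive_set(A, w, k=2))   # the two complementary genes, 0 and 1
```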
Are Graph Convolutional Networks Fully Exploiting Graph Structure?
Graph Convolutional Networks (GCNs) generalize the idea of deep convolutional
networks to graphs, and achieve state-of-the-art results on many graph related
tasks. GCNs rely on the graph structure to define an aggregation strategy where
each node updates its representation by combining information from its
neighbours. In this paper we formalize four levels of structural information
injection, and use them to show that GCNs ignore important long-range
dependencies embedded in the overall topology of a graph. Our proposal includes
a novel regularization technique based on random walks with restart, called
RWRReg, which encourages the network to encode long-range information into the
node embeddings. RWRReg is further supported by our theoretical analysis, which
demonstrates that random walks with restart empower aggregation-based
strategies (i.e., the Weisfeiler-Leman algorithm) with long-range information.
We conduct an extensive experimental analysis studying the change in
performance of several state-of-the-art models given by the four levels of
structural information injection, on both transductive and inductive tasks. The
results show that the lack of long-range structural information greatly affects
performance on all considered models, and that the information extracted by
random walks with restart, and exploited by RWRReg, gives an average accuracy improvement on all considered tasks.
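One plausible instantiation of the idea (the exact form of RWRReg is not spelled out above, so the regularizer below is an assumption): compute random-walk-with-restart proximities in closed form, then penalize node embeddings whose pairwise similarities disagree with those long-range proximities.

```python
import numpy as np

def rwr_matrix(adj, restart=0.15):
    """Random-walk-with-restart proximity: R[i, j] is the stationary
    probability that a walk restarting at node i with probability `restart`
    is found at node j; closed form R = c (I - (1 - c) P)^{-1}, with P the
    row-normalized adjacency matrix.
    """
    P = adj / adj.sum(axis=1, keepdims=True)
    n = adj.shape[0]
    return restart * np.linalg.inv(np.eye(n) - (1 - restart) * P)

def rwr_regularizer(embeddings, R):
    """Illustrative long-range penalty: mean squared disagreement between
    pairwise embedding similarity and RWR proximity, to be added to the
    task loss during training.
    """
    sim = embeddings @ embeddings.T
    return np.mean((sim - R) ** 2)

adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
R = rwr_matrix(adj)
emb = np.random.default_rng(0).standard_normal((4, 8)) * 0.1
print(rwr_regularizer(emb, R))
```

Because R encodes multi-hop proximities, gradients from this term push information about distant nodes into the embeddings, which plain neighbour aggregation cannot do in few layers.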
Modelling and analyzing adaptive self-assembling strategies with Maude
Building adaptive systems with predictable emergent behavior is a challenging task, and it is becoming a critical need. The research community has accepted the challenge by introducing approaches of various kinds: from software architectures, to programming paradigms, to analysis techniques. We recently proposed a conceptual framework for adaptation centered around the role of control data. In this paper we show that it can be naturally realized in a reflective logical language like Maude by using the Reflective Russian Dolls model. Moreover, we exploit this model to specify, validate, and analyse a prominent example of an adaptive system: robot swarms equipped with self-assembly strategies. The analysis exploits the statistical model checker PVeStA.
Adaptation is a Game
Control data variants of game models such as Interface Automata are suitable for the design and analysis of self-adaptive systems.